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Abstract 

A  key  reason  for  the  great  difficulty  of  writing,  testing,  and  veri¬ 
fying  parallel  programs  is  the  need  to  reason  simultaneously  about 
not  only  the  sequential  correctness  of  each  part  of  a  program  in  iso¬ 
lation.  but  also  about  all  possible  nondeterministic  interleavings  of 
the  program’s  parallel  threads.  Thus,  there  has  been  much  interest 
in  techniques  for  separately  testing  or  verifying  the  correctness  of 
a  program’s  use  of  parallelism,  allowing  the  program’s  functional 
correctness  to  be  tested  or  verified  in  a  sequential  way. 

Nondeterministic  Sequential  (NDSeq)  specifications  have  been 
proposed  as  a  means  for  achieving  this  decomposition  in  testing, 
debugging,  and  verifying  a  program’s  parallelism  correctness  and 
its  sequential  functional  correctness.  An  NDSeq  specification  for 
a  parallel  program  is  a  sequential  version  of  the  program,  with  no 
parallel  threads  but  with  some  limited,  controlled  nondeterminism. 
A  program’s  use  of  parallelism  is  correct  if,  for  every  execution  of 
the  parallel  program,  there  exists  an  execution  of  the  NDSeq  speci¬ 
fication  producing  the  same  result.  Functional  correctness  can  then 
be  checked  on  this  NDSeq  specification,  without  any  interleaving 
of  parallel  threads. 

While  NDSeq  specifications  have  been  used  successfully  to 
check  parallelism  correctness,  manually  writing  NDSeq  specifica¬ 
tions  for  programs  can  be  a  challenging  and  time-consuming  pro¬ 
cess.  Thus,  in  this  work,  we  propose  a  technique  for  automatically 
inferring  a  likely  NDSeq  specification  for  a  parallel  program.  Given 
a  few  representative  executions  of  a  parallel  program,  our  technique 
combines  dynamic  data  flow  analysis  and  Minimum-Cost  Boolean 
Satisfiability  (MinCostSAT)  solving  to  infer  a  likely  NDSeq  spec¬ 
ification  for  the  program’s  parallelism.  We  have  implemented  our 
technique  for  Java  in  a  prototype  tool  called  NDETERMIN.  For  a 
number  of  benchmarks,  our  tool  infers  equivalent  or  stronger  ND¬ 
Seq  specifications  than  those  previously  written  manually. 

1.  Introduction 

As  multicore  and  manycore  processors  become  increasingly  com¬ 
mon,  more  and  more  programmers  must  write  parallel  software. 
But  writing  such  parallel  software  can  be  difficult  and  error  prone. 
In  addition  to  reasoning  about  the  sequential  correctness  of  each 
component  of  a  program  in  isolation,  a  programmer  must  simul¬ 
taneously  consider  whether  multiple  components  running  in  paral¬ 
lel,  their  threads  interleaving  nondeterministically,  can  harmfully 
interfere  with  one  another.  The  need  to  deal  simultaneously  with 
the  correctness  of  a  program's  parallelism  and  its  often-sequential 
functional  correctness  greatly  complicates  writing,  testing,  debug¬ 
ging,  and  verifying  parallel  programs. 

We  have  previously  proposed  nondeterministic  sequential  (ND¬ 
Seq)  specifications  [4]  as  a  way  to  separately  address  the  correct¬ 
ness  of  a  program’s  parallelism  and  a  program's  sequential  func¬ 
tional  correctness.  The  key  idea  is  to  use  a  sequential  version  of  a 


structured-parallel  program  as  a  specification  for  the  program’s  par¬ 
allel  behavior.  That  is,  the  program’s  use  of  parallelism  is  correct 
if,  for  every  parallel  execution  of  the  program,  their  exists  an  exe¬ 
cution  of  the  sequential  version  that  produces  the  same  result.  But, 
to  capture  all  intended  behaviors  of  a  parallel  program,  the  sequen¬ 
tial  version  may  have  to  include  some  limited,  controlled  nondeter¬ 
minism.  Thus,  a  programmer  specifies  the  intended  or  algorithmic 
nondeterminism  in  their  parallel  application,  and  the  nondetermin¬ 
istic  sequential  (NDSeq)  specification  is  a  sequential  version  of  the 
program,  with  no  interleaving  of  parallel  threads  but  with  this  anno¬ 
tated  nondeterminism.  Any  additional  nondeterminism  is  an  error, 
due  to  unintended  interference  between  interleaved  parallel  threads, 
such  as  data  races  or  atomicity  violations.  Further,  the  functional 
correctness  of  a  parallel  program  can  then  be  tested,  debugged,  and 
verified  sequentially  on  the  NDSeq  specification,  without  any  need 
to  reason  about  the  uncontrolled  interleaving  of  parallel  threads. 

In  [4],  we  also  proposed  a  sound  runtime  technique  for  checking 
that  a  structured-parallel  program  conforms  to  its  NDSeq  specifi¬ 
cation.  This  technique  was  able  to  check  the  parallelism  correct¬ 
ness  of  a  number  of  complex  Java  benchmark  programs.  How¬ 
ever,  writing  NDSeq  specifications  for  parallel  programs  can  be  a 
time-consuming  and  challenging  process,  especially  to  a  program¬ 
mer  unfamiliar  with  such  specifications.  A  programmer  can  easily 
forget  to  include  some  correct  nondeterministic  behaviors,  forcing 
them  to  iterate  between  checking  their  program  against  its  NDSeq 
specification  and  inspecting  violating  executions  in  order  to  add 
such  missing  nondeterminism  to  the  specification. 

In  this  work,  we  propose  a  technique  for  automatically  infer¬ 
ring  a  likely  NDSeq  for  a  structured-parallel  program.  Given  a 
representative  set  of  correct  parallel  executions,  plus  some  sim¬ 
ple  annotations  about  which  program  locations  contain  the  final 
program  result,  our  algorithm  infers  a  NDSeq  specification  with 
a  minimal  amount  of  added  nondeterminism  necessary  to  cap¬ 
ture  all  behavior  seen  in  the  observed  parallel  executions.  Our  in¬ 
ference  algorithm  combines  dynamic  data-flow  analysis,  conflict- 
serializability  checking,  and  Minimum-Cost  Boolean  Satisfiability 
(MinCostSAT)  solving. 

We  have  implemented  our  NDSeq  specification  inference  al¬ 
gorithm  in  a  prototype  tool  for  Java,  called  NDETERMIN.  We  ap¬ 
plied  NDETERMIN  to  the  set  of  Java  benchmarks  for  which  we  pre¬ 
viously  hand-wrote  NDSeq  specifications  for  the  work  presented 
in  [4],  and  NDETERMIN  correctly  inferred  all  these  hand-written 
specifications.  This  provides  promising  preliminary  evidence  that 
NDETERMIN  can  infer  correct  and  useful  NDSeq  specifications  for 
parallel  applications. 

We  believe  that  automatically  inferring  NDSeq  specifications 
can  save  programmer  time  and  effort  in  applying  NDSeq  specifica¬ 
tions.  In  particular,  we  believe  that  using  an  inferred  specification  as 
a  starting  point  is  much  simpler  than  writing  the  whole  specification 
from  scratch.  Further,  our  inference  algorithm  can  detect  parallel 


1:  coforeach  (i  in  1 . N)  { 

2:  bool  done  =  false; 

3:  while  (!done)  { 

4: 

5:  int  prev  =  x; 

6 :  int  curr  =  i  *  prev  +  i ; 

7:  bool  c  =  CAS(x,  prev,  curr); 

8:  if  (c)  { 

9 :  done  =  true ; 

IS:  } 

11:  }  } 

Figure  1.  Simple  parallel  program  to  perform  the  reduction  in 
line  6  for  the  integers  ,N|,  in  some  arbitrary  order. 


behaviors  that  no  possible  NDSeq  specification  would  allow,  which 
often  contain  parallelism  bugs.  More  generally,  such  inferred  spec¬ 
ifications  can  aid  in  understanding  and  documenting  a  program's 
parallel  behavior.  Finally,  inferring  NDSeq  specifications  is  a  step 
towards  an  automated  approach  to  testing  and  verification  of  paral¬ 
lel  programs  by  decomposing  parallelism  and  sequential  functional 
correctness.  In  such  an  approach,  a  program’s  parallelism  would  be 
checked  against  its  inferred  NDSeq  specification,  while  functional 
correctness  would  be  checked  sequentially  on  the  NDSeq  specifica¬ 
tion  using  any  of  a  wide  variety  of  powerful  techniques  for  testing 
and  verifying  sequential,  nondeterministic  programs. 

2.  Overview 

In  this  section,  we  give  an  overview  of  our  algorithm  for  inferring 
NDSeq  specifications  for  parallel  programs  on  a  simple  example. 
We  first  present  some  background  on  NDSeq  specifications. 

2.1  Motivating  Example 

Consider  the  simple  parallel  program  in  Figure  1 .  The  program  con¬ 
sists  of  a  parallel  for-loop,  written  as  coforeach — each  iteration  of 
this  loop  attempts  to  perform  a  computation  (Line  6)  on  shared  vari¬ 
able  x,  which  is  initially  0.  Each  iteration  uses  an  atomic  compare- 
and-swap  (CAS)  operation  to  update  shared  variable  x.  If  multiple 
iterations  try  to  concurrently  update  x,  some  of  these  CAS’s  will  fail 
and  those  parallel  loop  iterations  will  recompute  their  updates  to  x 
and  try  again. 

Consider  the  parallel  execution  shown  in  Figure  3.  In  this  exe¬ 
cution,  the  i=l  iteration  reads  and  computes  an  updated  value  for 
shared  variable  x.  But  before  the  i=l  iteration  can  update  x,  the  i=2 
iteration  (in  another  thread)  runs  and  sets  x  to  2.  The  first  compare- 
and-swap  (CAS)  operation  in  the  i=l  iteration  then  fails,  and  the 
iteration  redoes  its  computation  before  successfully  updating  x. 

2.2  Background:  Nondeterministic  Sequential  Specifications 

The  problem  addressed  by  nondeterministic  sequential  (NDSeq) 
specifications  is  how  to  reason  about  the  correctness  of  the  parallel 
program  in  the  presence  of  thread  interleavings.  If  we  can  specify 
the  full  functional  correctness  of  a  program — i.e.,  for  our  motivat¬ 
ing  example,  specify  precisely  which  final  values  of  x  are  correct 
for  each  input  value  of  x — then  this  specification  will  clearly  im¬ 
ply  that  the  parallelization  of  the  program  was  correct.  Although  it 
may  look  straightforward  to  write  such  a  formal  specification  for 
our  motivating  example,  we  believe  that  it  will  be  a  very  difficult 
task  for  many  large  and  complex  programs. 

Instead  of  testing  or  verifying  a  parallel  program  directly  against 
a  functional  specification,  we  would  like  to  separate  this  end-to- 
end  reasoning  into  two  simpler  tasks:  (1)  the  checking  of  whether 
the  program  is  parallelized  correctly  independent  of  the  complex 
functional  correctness  and  (2)  the  checking  of  whether  the  program 


1:  nd-foreach  (i  in  1 . N)  { 

2:  bool  done  =  false; 

3:  while  (!done)  { 

4:  if  (*)  { 

5:  int  prev  =  x; 

6 :  int  curr  =  i  *  prev  +  i ; 

7:  bool  c  =  CAS(x,  prev,  curr); 

8:  if  (c)  { 

9 :  done  =  true ; 

IS:  } 

11:  }  I  } 

Figure  2.  A  nondeterministic  sequential  specification  for  the 
program  in  Figure  1. _ 


satisfies  a  functional  correctness  criteria  independent  of  any  inter¬ 
leaving  of  threads. 

A  natural  approach  to  specifying  parallelism  correctness  would 
be  to  specify  that  the  program  in  Figure  1  must  produce  the  same 
final  value  for  x  as  a  version  of  the  program  with  all  parallelism 
removed — i.e.,  the  entire  code  is  executed  by  a  single  thread.  (Note 
that,  this  condition  is  independent  of  which  final  values  of  x  are 
correct,  which  is  specified  and  checked  separately  as  the  functional 
correctness  of  the  program.)  However,  in  this  case  we  do  not  get  a 
sequential  program  equivalent  to  the  parallel  program.  For  exam¬ 
ple,  the  parallel  program  in  Figure  1  is  free  to  execute  the  compu¬ 
tations  at  line  6  in  any  nondeterministic  order.  Thus,  for  the  same 
input  value  of  x.  different  thread  schedules  can  produce  different 
values  for  x  at  the  end  of  the  execution.  On  the  other  hand,  exe¬ 
cuting  the  loop  sequentially  from  1  to  N  will  always  produce  the 
same,  deterministic  final  value  for  x.  Suppose  that  such  extra  non¬ 
deterministic  behaviors  due  to  thread  interleavings  are  intended;  the 
challenge  here  is  how  to  express  these  nondeterministic  behaviors 
in  a  sequential  specification. 

We  addressed  this  challenge  in  [4]  by  introducing  a  specifica¬ 
tion  mechanism  that  the  programmer  can  use  to  declare  the  in¬ 
tended,  algorithmic  notions  of  nondeterminism  in  the  form  of  a  se¬ 
quential  program.  Such  a  nondeterministic  sequential  specification 
(NDSeq)  for  our  example  program  is  shown  in  Figure  2.  This  spec¬ 
ification  is  intentionally  very  close  to  the  actual  parallel  program, 
but  its  semantics  is  sequential  with  two  nondeterministic  aspects. 
First,  the  nd-foreach  keyword  in  line  1  specifies  that  the  loop  iter¬ 
ations  can  run  in  any  permutation  of  the  set  1,  . . . ,  N.  This  part  of 
the  specification  captures  the  intended  nondeterministic  behavior 
of  the  program,  caused  in  the  parallel  program  by  running  threads 
with  arbitrary  schedules.  Second,  the  if(*)  keyword  in  line  4  spec¬ 
ifies  that  the  iteration  body  may  be  skipped  nondeterministically, 
at  least  from  a  partial  correctness  point  of  view;  this  is  acceptable, 
since  the  loop  in  this  program  fragment  is  already  prepared  to  deal 
with  the  case  when  the  effects  of  an  iteration  are  ignored  follow¬ 
ing  a  failed  CAS  statement.  In  summary,  all  the  final  values  of  x 
output  by  the  parallel  program  in  Figure  1  can  be  produced  by  a 
feasible  execution  of  the  NDSeq  specification  in  Figure  2.  Then, 
we  say  that  the  parallel  program  obeys  its  NDSeq  specification. 
In  fact,  we  have  developed  a  sound  algorithm  [4]  that  checks  for 
a  given  representative  interleaved  execution  trace  r  of  the  parallel 
program,  whether  there  exists  an  equivalent,  feasible  execution  of 
the  NDSeq  specification. 

2.3  Inferring  NDSeq  Specifications 

The  key  difficultly  with  the  previous  approach  is  that  writing  such 
specifications,  and  especially  the  placement  of  the  if(*)  con¬ 
structs,  can  be  difficult  in  many  practical  situations.  If  we  place 
too  few  if(*)  constructs,  we  may  not  be  able  to  specify  some  in¬ 
tended  nondeterministic  behaviors  in  the  parallel  code.  However, 


i=l 


Figure  3.  A  parallel  execution  of  two  iterations  (i=l,2)  of  the 
example  parallel  program  front  Figure  1 .  The  vertical  order  of 
events  shows  the  interleaving.  Each  assignment  shows  in  paren¬ 
theses  the  value  being  assigned.  The  thin  dotted  arrows  denote 
data  dependencies  between  events.  The  thick  solid  arrows  de¬ 
note  transactional  conflicts. 


if  we  place  too  many  if(*)  constructs,  or  if  we  place  them  in  the 
wrong  places,  the  specification  might  allow  too  much  nondetermin¬ 
ism,  which  will  likely  violate  the  intended  functionality  of  the  code. 

Our  contribution  in  this  paper  is  to  give  an  algorithm,  running 
on  a  set  of  input  execution  traces,  for  inferring  a  minimal  nondeter- 
ministic  sequential  specification  such  that  the  checking  approach 
described  in  [4]  on  the  input  traces  succeeds.  Choosing  a  minimal 
specification — i.e.,  with  a  minimal  number  of  iff*),  is  a  heuristic 
that  makes  it  more  likely  that  the  inferred  specification  matches  the 
intended  behavior  of  the  program.  Our  key  idea  is  to  reformulate 
our  runtime  checking  algorithm  in  [4]  (explained  below)  as  a  con¬ 
straint  solving  and  optimization  problem,  in  particular  a  Minimum 
Cost  Boolean  Satisfiability  (MinCostSAT)  problem. 

Runtime  Checking  Parallelism  Correctness  Consider  the  paral¬ 
lel  execution  shown  in  Figure  3.  Our  algorithm  in  [4]  checks  if  this 
trace  can  be  serialized  with  respect  to  the  NDSeq  specification — i.e. 
whether  the  final  result  (the  value  of  the  shared  variable  x)  can  be 
obtained  by  running  the  loop  iterations  sequentially  in  some  non- 
deterministic  order.  For  our  example  trace,  the  algorithm  discovers 
the  serialization  in  Figure  4.  This  serialization  is  a  witness  to  the 
correctness  of  the  parallelism  in  the  trace  in  Figure  3. 

The  algorithm  in  [4]  determines  whether  or  not  such  an  NDSeq 
execution  exists  by  generalizing  conflict-serializability  [27]  check¬ 
ing.  We  now  describe  conflict-serializability  on  our  motivating  ex¬ 
ample  to  show  why  and  how  conflict-serializability  must  be  gen¬ 
eralized  for  checking  NDSeq  specifications.  Given  a  collection  of 
transactions — in  this  case,  we  think  of  each  parallel  loop  iteration 
as  a  transaction — we  form  the  conflict  graph  whose  vertices  are  the 


i=2 


Figure  4.  A  serialization  of  the  parallel  execution  in  Figure  3. 
This  serialization  becomes  a  witness  that  there  exists  an  equiva¬ 
lent  execution  of  the  NDSeq  specification  in  Figure  2.  The  ver¬ 
tical  order  of  events  shows  the  interleaving.  Each  assignment 
shows  in  parentheses  the  value  being  assigned.  The  thick  solid 
arrows  denote  transactional  conflicts. 


transactions  and  with  a  conflict  edge  from  transaction  tr  to  tr'  if  tr 
and  tr'  contain  conflicting  operations  op  and  op'  with  op  happening 
before  op' .  Two  operations  from  different  threads  are  conflicting  if 
they  operate  on  the  same  shared  global  and  at  least  one  of  them  is 
a  write;  in  Figure  3  and  4  the  conflicts  are  shown  with  thick  solid 
arrows.  It  is  a  well-known  result  [27]  that  if  there  are  no  cycles  in 
the  conflict  graph,  then  the  transactions  are  serializable. 

Because  the  conflict  arrows  from  the  i=l  iteration  to  the  i=2 
iteration  (from  eg  to  eg)  and  vice  versa  (e.g.,  from  eg  to  ei3,  ei6,  or 
ei8)  form  a  cycle,  these  two  iterations  are  not  conflict-serializable. 
Yet,  this  execution  trace  is  serializable,  since  its  result  is  the  same 
as  if  we  run  first  the  iteration  i=2  followed  by  i=l.  Therefore,  we 
need  a  more  general  notion  than  conflict-serializability. 

In  order  to  report  this  execution  serializable,  we  must  be  able  to 
show  that  all  conflict  cycles  between  iteration  i=l  and  i=2  can  be 
safely  ignored.  For  this,  we  perform  a  dynamic  data  flow  analysis 
and  use  the  if(*)  in  the  program's  NDSeq  specification  in  this 
analysis.  In  particular,  we  need  to  identify  relevant  events  in  the 
traces:  (1)  final  writes  to  the  shared  variable  x,  and  (2)  all  events  on 
which  events  in  (1)  are  (transitively)  dependent.  Then,  we  check  if 
there  is  any  conflict  cycle  formed  by  only  relevant  events;  we  can 
safely  ignore  the  cycles  that  contain  irrelevant  events. 

Computing  Relevant  Events  When  computing  the  set  of  relevant 
events,  we  consider  all  data  dependencies  between  events  (shown 
with  thin  dotted  arrows  in  Figure  3).  For  the  trace  in  Figure  3,  we 
first  include  events  eg  and  eis  in  the  relevant  events,  as  both  write 
to  shared  variable  x.  We  then  include  ez,  eg,  ei6,  and  eiz,  as  eg  and 
eis  are  data  dependent  on  these  events. 


The  way  we  consider  control  dependencies  is  subtle.  By  default, 
a  deterministic  branch  event  is  considered  relevant  and  all  events 
that  flow  into  its  branch  condition  become  relevant.  For  example, 
in  Figure  3,  the  events  e2,  e6,  eio,  ei2,  ei9,  and  e2i  are  considered 
relevant,  and  we  include  events  ei  and  es,  as  e2  and  e%  are  data 
dependent  on  the  writes  of  local  variable  done.  Exceptionally,  a 
branch  event  can  be  considered  irrelevant  if  that  event  is  executed 
by  a  statement  s  enclosed  within  iff*)  in  the  program’s  NDSeq 
specification  and  all  the  events  generated  by  that  execution  of 
s  are  irrelevant.  Intuitively,  this  means  that  in  the  corresponding 
execution  of  the  NDSeq  specification,  that  particular  execution  of 
s  can  be  entirely  ignored  without  affecting  the  final  outcome  of 
the  execution  (by  considering  that  the  nondeterministic  iff*)  will 
be  resolved  to  if(false)  in  the  corresponding  NDSeq  execution). 
Therefore,  when  inferring  the  NDSeq  specification,  we  need  to  look 
for  statements  to  add  if(*)  so  that  we  can  ignore  events  that  are 
involved  in  conflict  cycles.  In  the  presence  of  the  data  dependencies 
between  events,  this  becomes  a  combinatorial  search  problem. 

In  order  to  show  that  the  execution  Figure  3  is  serializable,  we 
need  to  ignore  the  conflict  cycles  formed  by  the  thick  solid  arrows 
in  the  figure.  For  this,  possible  candidate  events  to  ignore  are:  (1) 
the  read  e3,  (2)  the  write  eg,  or  (3)  all  three  of  ei3,  ei6,  and  eig.  But, 
since  the  events  e9,  ei6,  and  eis  affect  the  computation  of  x,  they 
are  relevant  for  the  final  result  of  the  trace  and  they  could  not  be 
eliminated  in  a  matching  serialization  even  if  they  were  guarded  by 
if(*).  Thus,  our  inference  algorithm  must  focus  on  placing  if(*) 
around  events  e3  and  ei3. 

If  we  enclose  an  if(*)  around  lines  5-10  as  shown  in  Figure  2, 
we  can  safely  mark  event  ei4  irrelevant,  because  the  branch  ei4  cor¬ 
responds  to  is  not  evaluated,  and  thus,  does  not  affect  the  rest  of  the 
execution.  This  also  makes  the  events  e3,  e4,  and  ei3  irrelevant  be¬ 
cause  these  events  flow  into  only  each  other  and  ei4.  Therefore,  we 
can  ignore  the  events  e3,  e4,  ei3,  and  ei4  together  with  the  conflict 
cycles  they  are  involved  in.  In  fact,  the  only  conflict  cycles  in  the 
execution  are  formed  by  the  events  e3  and  ei3,  and  after  ignoring 
these  cycles,  we  can  declare  the  execution  in  Figure  3  serializable. 
Serializing  this  execution  respecting  the  remaining  conflict  edges 
gives  us  the  execution  trace  in  Figure  4.  This  trace  can  also  be  gen¬ 
erated  by  a  nondeterministic  execution  of  the  NDSeq  specification 
given  in  Figure  2  by  choosing  false  for  iff*)  in  the  first  iteration 
of  i=l. 

MinCostSAT  Solving  for  if(*)  Placements  Having  explained 
how  a  given  parallel  execution  trace  is  checked  against  an  existing 
NDSeq  specification,  we  next  explain  how  to  infer  such  a  speci¬ 
fication.  For  this,  we  observe  a  set  of  representative  parallel  exe¬ 
cution  traces  for  which  the  standard  conflict  serializability  check 
gives  conflict  cycles.  Since  we  are  inferring  an  NDSeq  specifica¬ 
tion  for  the  program,  not  for  a  single  trace,  using  multiple  traces 
allows  us  to  observe  variations  in  the  executions  and  improves  the 
reliability  of  the  inferred  NDSeq  specification. 

We  then  construct  and  solve  a  MinCostSAT  formula  that  takes 
as  input  the  events  in  the  input  traces  and  the  conflict  cycles  de¬ 
tected  by  the  standard  conflict  serializability  check.  While  generat¬ 
ing  the  formula,  we  encode  the  reasoning  about  relevant  events  and 
conflict  cycles  described  above  as  constraints  in  the  formula.  In 
particular,  the  constraints  enforce  the  data  dependencies  between 
the  events  and  conditions  to  ignore  all  observed  conflict  cycles  in 
the  input  traces.  The  MinCostSAT  formulation  contains  variables 
corresponding  to  possible  placement  of  if(*)s  in  the  program.  If 
this  formula  is  satisfiable,  then  the  solution  gives  us  a  minimal  set 
of  statements  S*  in  the  program,  such  that  the  input  traces  are  all 
serializable  with  respect  to  the  NDSeq  specification  obtained  by 
enclosing  all  statements  in  S*  with  if(*). 

Constraints  are  added  to  impose  a  number  of  conditions: 


1.  For  each  cycle  of  transactional  conflicts,  at  least  one  of  the 
events  involved  in  the  cycle  must  be  made  irrelevant.  For  ex¬ 
ample.  we  would  add  constraint  (Xe3  V  Xeg  V  Xei3)  for  the 
cycle  between  the  i=l  and  i=2  iterations  by  conflicts  e3  i— >  e9 
and  e9  h- >  ei3.  This  constraint  enforces  that  at  least  one  of  the 
variables  Xe3,  Xeg ,  and  Xei3  be  1  in  the  solution. 

2.  Each  event  e  can  be  made  irrelevant  only  if  all  events  that  are 
data  or  control  dependent  on  e  are  also  irrelevant.  For  example, 
e3  can  be  made  irrelevant  only  if  e4,  ei3,  and  ei4  are  made 
irrelevant,  as  well.  For  example,  (Xe3  =*-  Xe4 )  is  among 
the  constraints  added  to  model  this  requirement.  The  constraint 
enforces  that  whenever  Xe3  is  1  in  the  solution,  Xe4  be  also  1 
in  the  same  solution. 

3.  For  each  event  e,  we  add  a  constraint  indicating  that  e  is  made 
irrelevant  only  if  some  iff*)  is  added  such  that  both:  (a)  some 
dynamic  instance  of  the  if(*)  contains  e,  and  (b)  no  event 
contained  by  that  dynamic  instance  is  relevant. 

For  example,  an  if(*)  around  line  5,  lines  5-7,  or  lines  5-10 
would  make  ez  irrelevant,  because  none  of  events  e4,  ei3,  or 
ei4  (which  depend  on  es)  are  relevant.  But  an  if(*)  around  the 
entire  while  statement  would  not,  because  the  dynamic  if(*) 
containing  e3  would  also  contain  the  relevant  event  eis. 

4.  Finally,  we  forbid  adding  overlapping  iff*)  constructs.  For 
example,  we  forbid  adding  both  an  iff*)  around  lines  5  and 
6  and  one  around  lines  6  and  7,  as  this  would  not  be  a  well- 
structured  program. 

These  constraints  allow  any  solution  that  covers  all  of  lines  5- 
10.  and  no  more,  with  some  number  of  if(*)  constructs.  (Because 
events  e3,  e4,  ei3,  and  644  all  must  be  made  irrelevant,  and  any 
larger  iff*)  including  these  events  would  include  relevant  events). 
The  minimal  such  solution  places  a  single  if(*)  that  encloses 
lines  5-10.  Thus,  our  algorithm  produces  the  correct  NDSeq  speci¬ 
fication  for  this  example. 

3.  Background:  NDSeq  Specifications 

In  this  section,  we  formalize  the  condition  for  a  parallel  program 
to  satisfy  its  NDSeq  specification,  and  formally  describe  our  ap¬ 
proach  in  [4]  to  checking  this  condition.  For  this,  we  will  assume 
that  we  are  given  an  NDSeq  specification  a  priori.  We  show  in  the 
next  section  how  to  infer  such  a  specification  with  minimal  nonde¬ 
terminism. 

3.1  NDSeq  Specifications  and  Parallelism  Correctness 

Given  a  parallel  program  V,  we  specify  the  correct  behavior  of  a 
parallel  program  by  writing  an  equivalent  nondeterministic  sequen¬ 
tial  (NDSeq)  program  as  a  specification  of  V,  instead  of  explicitly 
giving  a  specification  of  the  required  input-output  behavior.  Infor¬ 
mally,  the  equivalence  means  that  for  any  input  and  thread  schedule 
of  V  there  exists  an  execution  of  the  NDSeq  program  that  produces 
the  same  output.  We  formalize  these  concepts  next  after  describing 
the  language  constructs  that  we  use  to  write  parallel  programs  and 
their  specifications. 

We  will  follow  the  approach  in  [4]  where  the  specification  is 
embedded  in  the  parallel  program  itself.  This  embedding  enables 
one  to  readily  apply  standard  conflict  serializability  checking  tech¬ 
niques  to  verify  the  program  against  its  NDSeq  specification.  The 
syntax  for  the  language  is  shown  in  Figure  5.  To  simplify  the  pre¬ 
sentation  we  consider  a  NDSeq  program  V  to  consist  of  a  sin¬ 
gle  procedure.  We  omit  the  discussion  of  multiple  procedures  and 
object-oriented  concepts,  and  assume  that  each  global  variable  (in 
Global )  refers  to  a  distinct  location  on  the  shared  heap,  and  each  lo¬ 
cal  variable  (in  Local)  refers  to  a  distinct  stack  location  of  a  thread. 


g  £  Global  l  £  Local  x  £  Var  =  Global  U  Local 

s  £  Stmt  ::=  l  =  l  op  l  \  l  =  constant  |  1  =  1  \  g  =  l  \  l  =  g  \  s;s 

|  if  (0  s  else  s  |  while(t)  s  |  foreach  ( l  in  l )  s 
|  coforeach  (l  in l)  s  |  cobegin 
|  atomic  s  |  if(*)  s 

Figure  5.  Selected  statements  of  our  language.  The  constructs 
with  a  different  semantics  in  the  parallel  program  and  the  NDSeq 
specification  are  shown  in  gray  color. 


Given  a  parallel  program  V,  the  NDSeq  specification  is  ob¬ 
tained  by  (a)  overloading  the  parallel  constructs  that  create  threads 
(coforeach  and  cobegin)  in  a  sequential  context,  and  (b)  introduc¬ 
ing  nondeterministic  control  flow  with  if(*).  Specifically,  given  an 
statement  s  in  the  parallel  program,  the  user  can  modify  the  state¬ 
ment  to  if  (*)  {s}  in  the  NDSeq  specification.  For  each  program  V , 
given  a  set  of  statements  to  enclose  with  if(*),  we  define  two  sets 
of  executions  ParExecs(V)  and  NdSeqExecs(P ),  described  below. 
The  correctness  of  the  parallel  program  is  then  given  by  relating 
ParExecs(V)  and  NdSeqExecs(V). 

Parallel  executions:  ParExecs(V)  contains  the  parallel  execu¬ 
tions  of  V  where  each  cobegin  and  coforeach  statement  creates 
implicitly  new  threads  to  execute  its  body,  cobegin  is 

evaluated  by  executing  each  of  Si, ...,  sn  on  a  separate,  newly  cre¬ 
ated  thread,  coforeach  is  evaluated  by  executing  each  iteration 
of  the  loop  on  a  separate,  newly  created  thread.  Following  struc¬ 
tured  fork/join  parallelism,  a  parallel  execution  of  a  cobegin  and 
coforeach  statement  terminates  only  after  all  the  threads  created  on 
behalf  of  the  statement  terminate.  Assignments,  branch  conditions, 
and  the  entire  atomic  statement  are  executed  as  atomic  steps  with¬ 
out  being  interrupted  by  other  threads.  Note  that,  if(*){s}  state¬ 
ments  are  introduced  in  the  NDSeq  specification  and  do  not  appear 
in  the  parallel  program. 

NDSeq  executions:  NdSeqExecs(V)  contains  the  (nondetermin¬ 
istic)  sequential  executions  of  V  where  all  statements  are  evaluated 
by  a  single  thread.  Under  the  sequential  semantics,  the  statements 
other  than  cobegin  and  coforeach  are  interpreted  in  the  standard 
way.  Statement  atomic  s  is  simply  equivalent  to  s.  Each  evalua¬ 
tion  of  cobegin  si;  ...;sn  is  equivalent  to  running  a  nondetermin¬ 
istic  permutation  of  statements  si, ...,  s„.  A  statement  coforeach 
is  evaluated  similarly  to  its  deterministic  version  (foreach)  except 
that  the  elements  of  the  collection  being  iterated  over  are  processed 
in  a  nondeterministic  order.  This,  in  essence,  abstracts  the  seman¬ 
tics  of  the  collection  to  an  unordered  set.  Finally,  if(*)  indicates  a 
nondeterministic  branch.  That  is,  each  time  a  statement  if(*){s} 
is  evaluated,  a  boolean  value  is  chosen  for  *  nondeterministically. 

The  parallelism  correctness  for  V  means  that  every  final  state 
reachable  by  a  parallel  execution  of  the  program  from  a  given  initial 
state  is  also  reachable  by  a  NDSeq  execution  from  the  same  initial 
state.  Therefore,  parallel  executions  have  no  unintended  nondeter¬ 
minism  caused  by  thread  interleavings:  either  the  nondeterminism 
is  prevented  using  synchronization,  or  it  is  expressed  by  the  nonde¬ 
terministic  control  flow  in  the  sequential  specification. 

While  defining  correctness,  we  distinguish  a  set  of  global  vari¬ 
ables  as  focus  variables,  which  are  considered  to  be  effective  on 
the  functionality  of  the  program.  Then,  we  reason  about  the  equiv¬ 
alence  executions  by  referring  to  the  final  valuation  of  the  focus 
variables.  For  example,  consider  a  parallel  search  algorithm.  The 
variable  pointing  to  the  best  (optimal)  solution  found  is  a  focus 
variable,  while  statistics  counters,  which  do  not  affect  the  final  out¬ 
come  of  the  search,  are  non-focus  variables. 


Definition  1  (Safe  parallel  execution).  A  parallel  execution  in 
ParExecs(V)  of  a  program  V  is  safe  with  respect  to  a  set  Focus  C 
Global,  if  there  exists  a  sequential  execution  in  NdSeqExecsfP), 
such  that  the  initial  states  are  the  same  in  both  executions  and  their 
final  states  agree  on  the  value  of  all  variables  g  £  Focus. 

Definition  2  (Parallelism  Correctness).  A  program  V  obeys  its 
NDSeq  specification  with  respect  to  a  set  Focus  C  Global,  if  every 
parallel  execution  ofV  is  safe  with  respect  to  Focus. 

3.2  Conflict  Serializability  Checking  of  Parallel  Executions 

A  key  motivation  behind  keeping  the  specification  of  a  parallel 
program  similar  to  the  parallel  program  is  that  we  can  readily 
apply  standard  conflict  serializability  checking  techniques  to  verify 
the  program  against  its  NDSeq  specification.  We  briefly  describe 
conflict  serializability  checking  [12,  14,  27]  because  we  will  be 
using  such  checking  in  our  proposed  inference  algorithm. 

We  assume  that  the  parallel  program  is  free  of  low-level  data 
races  [26].  We  check  conflict  serializability  on  an  execution  trace 
described  as  a  sequence  of  execution  events.  For  each  event  e  we 
have  the  following  information: 

Type(e)  is  the  type  of  the  event,  defined  as  follows: 

T  ::=  x  =  x  \  branch(l) 

The  “x  =  x'”  event  type  corresponds  to  the  assignment  and  binary 
operation  statements  in  our  language  (shown  in  Figure  5;  recall  that 
metavariable  x  stands  for  both  locals  and  globals).  We  use  a  simple 
assignment  in  our  formal  description  to  simplify  the  presentation; 
unary  and  binary  operators  do  not  pose  notable  difficulties.  We 
assume  that  an  event  can  read  a  global,  or  write  a  global,  but 
not  both.  The  “branch(fj”  event  marks  the  execution  of  a  branch 
operation  when  the  boolean  condition  denoted  by  local  l  evaluated 
to  true.  A  branch  event  can  be  generated  by  the  statements  of 
the  form  while(t)  s  and  if  (Z)  s.  The  case  of  a  branch  when  the 
negation  of  a  local  is  true  is  similar.  Our  algorithm  does  not  require 
specific  events  to  mark  the  start  and  end  of  procedures  or  atomic 
blocks.  We  write  e  :  T  when  e  has  type  T. 

Thread(e)  denotes  the  thread  that  generates  the  event  e.  Recall  that 
a  new  thread  is  created  for  each  dynamic  instance  of  a  block  of  a 
cobegin  statement  and  for  each  iteration  of  a  coforeach  statement. 

First  we  define  the  conflict  relation  between  individual  events 
with  respect  to  a  set  of  events,  denoted  £ .  For  traditional  conflict 
serializability  checking  £  is  instantiated  to  the  set  of  all  events  in  a 
trace.  Flowever,  we  will  overload  this  parameter  in  the  subsequent 
sections. 

Definition  3  (Conflicting  events  in  a  set  of  events  £).  Given  a 
set  of  events  £,  two  events  e,  e'  £  r  are  conflicting  in  £  ( written 
e  er)  iff  (a)  e,  e'  £  £,  and  (b)  e  occurs  before  e'  in  r,  and  (c) 
both  events  operate  on  the  same  shared  global  variable,  and  at  least 
one  of  them  represents  a  write,  and  (d)  the  events  are  generated  by 
different  threads. 

Next  we  lift  the  conflict  relation  from  events  to  threads.  When 
comparing  two  threads  for  conflicts  we  need  to  consider  their 
events  and  all  the  events  of  their  descendant  threads.  Thus,  for  a 
thread  t  we  define  its  transaction  as  the  set  of  events  Trans  ( t )  that 
includes  all  the  events  of  t  and  of  the  descendant  threads  of  t. 

Definition  4  (Conflicting  threads  with  respect  to  a  set  of  events  £). 
Given  a  set  of  events  £,  two  threads  t,  t!  are  conflicting  in  trace 
t  (written  t  t' )  iff  (a)  their  transaction  sets  are  disjoint  (i.e., 
one  is  not  a  descendant  of  the  other),  and  (b)  there  exist  two  events 
e  £  Trans  ( t )  and  e '  £  Trans  ( t ')  that  are  conflicting  (e  e' ). 
The  relation  t  t'  is  the  transitive  and  reflexive  closure  of  the 
thread  conflict  relation. 


Given  a  parallel  execution  trace  r  and  a  subset  of  events  £  of 
r.  the  runtime  conflict  serializability  checking  algorithm  works  as 
follows.  We  compute  conflict  relation  between  threads  from  r. 
If  there  exists  a  cycle  t  t'  t  (where  t  t'),  then  we  report 
that  the  trace  is  not  conflict-serializable;  otherwise,  we  declare  that 
the  execution  is  safe.  Intuitively,  conflict  serializability  relies  on 
the  fact  that  an  interleaved  trace  can  be  transformed  incrementally 
into  a  serialized  trace  by  a  sequence  of  swaps  of  adjacent  events. 
From  the  definition  of  a  conflict,  it  is  always  safe  to  swap  two 
adjacent  non-conflicting  events  without  changing  the  final  state  of 
the  execution,  and  a  cycle  of  conflicts  indicates  that  a  serialized 
trace  cannot  be  obtained  by  swapping  non-conflicting  events. 

The  following  theorem  states  that  conflict  serializability  pro¬ 
vides  a  sound  way  of  checking  parallel  executions  against  a  NDSeq 
specification.  (The  proof  of  the  theorem  relies  on  classical  results 
from  the  database  theory  [27].) 

Theorem  1.  Let  t  be  a  parallel  execution  ofP.  If  the  above  run¬ 
time  conflict  serializability  checking  algorithm  (where  £  contains 
all  events  in  t)  does  not  report  any  non-serializable  transaction, 
then  t  is  safe  with  respect  to  its  NDSeq  specification. 

3.3  Improving  Conflict  Serializability  Checking 

A  standard  conflict-serializability  algorithm  [27]  considers  all 
events  in  a  trace.  (Note  that,  Theorem  1  holds  when  we  instantiate 
£  to  the  set  of  all  events  of  the  trace.)  However,  in  many  concurrent 
programs  it  is  common  that  results  of  some  computations  access¬ 
ing  shared  variables  are  discarded  and  are  not  relevant  to  the  rest  of 
the  execution.  In  Section  2  we  demonstrated  such  a  case,  in  which 
partial  work  based  on  a  previous  read  of  shared  variable  x  was  dis¬ 
carded  when  another  conflicting  access  to  x  was  later  detected.  In¬ 
formally,  an  event  is  relevant  if  it  writes  to  a  memory  location  that 
is  eventually  used  in  the  computation  of  a  final  value  of  a  focus 
variable  or  a  deterministic  (while(i)  or  i£(Z))  branch  taken  in  the 
execution.  We  showed  that  if  we  perform  traditional  serializability 
checking  of  a  trace  considering  only  the  relevant  events  in  the  trace, 
we  can  safely  ignore  conflict  cycles  [4],  Below  we  formally  define 
the  set  of  relevant  events;  in  Section  4  we  show  how  to  compute 
this  set  by  constraint  solving. 

Let  5*  be  the  set  of  statements  that  are  immediately  enclosed 
with  if(*)  in  the  NDSeq  specification  of  the  program.  Given  a 
trace  r,  we  denote  the  set  of  relevant  events  in  r  with  respect  to 
the  NDSeq  specification  by  Relevant(r,S*).  In  order  to  define 
Relevant  formally,  we  need  some  notation. 

Dynamic  Data  Dependence:  To  track  the  relevance  aspect  we 
compute  a  dynamic  data-dependence  relation  between  events.  For 
trace  r,  we  define  the  data-dependence  relation  —*T  as  follows: 
D1  (Data  Dependence).  For  each  local  variable  read  ej  :  x  =  l 
or  branch  e-j  :  branch(l ),  we  add  a  dependence  (e;  —*T  ej)  on 
the  last  a  :  l  =  x'  that  comes  before  ej  in  r.  This  dependence 
represents  an  actual  data  flow  through  local  l  from  a  to  ej  in  the 
current  trace.  Both  of  these  events  are  in  the  same  thread  (since  they 
operate  on  the  same  local)  and  their  order  and  dependence  will  be 
the  same  in  any  serialization  of  the  trace.  These  dependence  edges 
are  shown  as  thin  dashed  arrows  in  Figure  3. 

D2  (Inter-Thread  Dependence).  For  each  global  variable  read 
ej  :  l  =  g  we  add  dependencies  (a  —*T  e:>)  on  events  e,  :  g  =  l' 
as  follows.  From  each  thread  we  pick  the  last  write  to  g  that  comes 
before  e:1  in  r,  and  the  first  write  to  g  that  comes  after  e,  in  r.  This 
conservative  dependence  is  necessary  because  the  relative  order  of 
reads  and  writes  to  a  global  variable  from  different  threads  may 
change  in  a  serialization  of  the  trace.  In  this  way,  dependencies  are 
preserved  while  reordering  the  accesses. 

Let  denote  the  transitive  closure  of  —*T.  We  omit  the 
subscripts  when  r  is  clear  from  context. 


We  write  Es  to  denote  the  set  that  contains  exactly  the  events 
generated  by  a  dynamic  instance  of  s  in  some  execution.  Let 
NdBlock(e)  return  the  smallest  set  Es  such  that  e  £  Es  and 
s  £  S*.  In  other  words,  NdBlock(e)  gives  the  execution  of  the 
smallest  statement  that  generated  e  and  is  enclosed  with  if(*)  in 
the  NDSeq  specification  of  the  program.  If  e  is  not  generated  by  a 
statement  enclosed  with  if(*),  then  NdBlock(e)  is  undefined. 

Definition  5  (Relevant  events).  Let  t  be  a  parallel  execution  trace. 
Relev  ant  (t,S*)  is  the  smallest  set  of  events  from  r  such  that 
e  £  Relevant(r,S*),  if  one  of  the  following  holds: 

R1  e  :  g  =  l,  and  e  is  the  last  write  to  g  £  Focus  in  Thread(e). 

R2  e  :  branch(l),  and  NdBlock(e)  is  undefined. 

R3  e  :  branch(l),  NdBlock(e)  =  Es,  and  there  is  an  event 

e'  £  Es  such  that  e'  £  Relevant (r,S*). 

R4  Exists  an  event  e '  £  Relevant(r,S*)  such  that  e  --+*  e' . 

(Note  that,  in  this  case  e  is  an  assignment  event.) 

R1  makes  all  final  writes  to  focus  variables  relevant.  R2  and 
R3  state  the  condition  for  a  (deterministic)  branch  event  e  to  be¬ 
come  relevant:  either  if  e  is  not  part  of  any  execution  of  a  statement 
enclosed  with  if(*)  in  the  NDSeq  specification,  or  if  the  small¬ 
est  such  execution  generating  e  already  contains  another  relevant 
event.  R4  makes  an  event  relevant  if  it  flows  into  another  relevant 
event.  Notice  that,  a  dynamic  instance  of  a  statement  s  £  S*  be¬ 
comes  totally  irrelevant,  if  it  does  not  contain  a  final  write  to  a  fo¬ 
cus  variable  and  none  of  the  events  generated  by  that  instance  flow 
(through  ---»)  into  other  relevant  events  outside  the  instance. 

The  following  theorem,  proved  in  [4],  states  that  it  is  sound  to 
check  conflict  serializability  by  only  considering  relevant  events  in 
the  trace.  Here,  we  substitute  £  with  Relevant  (r,  5*)  in  Defini¬ 
tion  3  and  4  rather  than  all  the  events  in  the  trace. 

Theorem  2  (Soundness).  Let  t  be  the  trace  of  a  parallel  execution 
of  a  program  V.  If  the  runtime  conflict  serializability  checking 
algorithm  only  considering  Relevant  (t,S*)  does  not  report  any 
non-serializable  transaction,  then  the  parallel  execution  is  safe 
with  respect  to  the  NDSeq  specification  ofV  and  Focus. 

4.  Inferring  a  Suitable  NDSeq  Specification 

Having  explained  the  runtime  approach  to  reasoning  about  whether 
a  program  obeys  its  NDSeq  specifications  and  the  role  of  if(*) 
annotations  in  this,  the  next  question  is  where  users  should  add 
such  if(*)  annotations?  In  this  paper,  we  are  not  going  to  describe 
where  to  insert  such  annotations.  A  list  of  recipes  for  inserting  such 
annotations  can  be  found  our  previous  work  [4],  In  this  work,  we 
are  going  to  figure  the  annotations  automatically.  In  particular,  our 
goal  is,  given  a  set  of  traces  T  of  V,  to  come  up  with  a  set  5* 
of  statements  so  that  if  we  immediately  enclose  the  statements 
in  S *  with  if(*)  and  compute  the  relevant  events  in  T  (as  in 
Definition  5),  conflict  serializability  checking  over  these  relevant 
events  gives  no  serializability  violations,  showing  that  all  the  traces 
in  T  are  safe  with  respect  to  the  NDSeq  specification  obtained 
from  S*.  We  assume  that  the  user  still  indicates  which  variables 
constitute  the  focus  variables. 

In  order  to  infer  the  NDSeq  specification,  we  compute  the 
set  of  relevant  events  by  solving  a  Boolean  Satisfiability  prob¬ 
lem  (SAT).  In  contrast  to  the  original  algorithm  for  computing 
Relevant (t,S*)  in  [4],  formulating  our  reasoning  as  a  constraint 
solving  problem  allows  us  to  not  only  check  that  a  given  trace  is 
safe.  It  also  enables  us  to  perform  such  checking  without  giving  an 
NDSeq  specification.  Instead,  we  ask  the  SAT  solver  to  find  a  suit¬ 
able  set  S*  of  statements  to  enclose  with  iff*),  so  that  the  solver 
can  compute  a  proper  set  of  relevant  events  using  which  it  can  show 
that  the  traces  in  T  are  all  safe. 


We  also  need  to  be  careful  with  the  set  S*,  because  if(*)s  could 
add  extra  behaviors  to  the  NDSeq  specification  of  the  program  and 
those  extra  behaviors  should  not  violate  the  functional  correctness 
of  the  program.  Therefore,  we  need  to  find  the  minimal  set  5*  that 
is  necessary  to  show  that  the  traces  in  T  are  all  safe.  For  this,  we 
then  turn  the  problem  into  a  Minimum-Cost  Boolean  Satisfiability 
Problem  (MinCostSAT). 

4.1  SAT  Formulation  for  Inferring  NDSeq  Specification 

We  start  with  a  program  V  where  a  set  Focus  of  focus  variables 
are  marked  by  the  programmer,  but  no  statement  is  enclosed  with 
if(*).  i.e..  the  set  S *  is  empty.  We  are  given  a  set  T  of  parallel 
execution  traces  and  a  set  of  conflict  cycles  detected  by  applying 
a  standard  conflict  serializability  check  on  each  trace.  Theorem  1 
and  2  imply  that  if  a  conflict  serializability  check,  over  all  events 
or  only  the  events  in  Relevantfir,  S*)  where  S*  =  0,  respectively, 
finds  no  conflict  cycles  in  a  trace  r,  then  r  is  safe.  Thus,  we  assume 
that  each  trace  in  T  contains  at  least  one  conflict  cycle.  Our  goal 
is  to  determine  if  there  exist  an  NDSeq  specification  with  a  non¬ 
empty  set  S*  of  statements  to  enclose  with  if(*)  using  which  we 
can  show  that  all  the  conflict  cycles  can  be  ignored  safely,  i.e.,  each 
cycle  contains  at  least  one  relevant  event.  For  this,  we  construct  and 
solve  a  SAT  instance  on  the  following  (boolean)  indicator  variables: 

•  X3,  for  each  statement  s  mV. 

•  Xe,  for  each  event  e  in  any  of  the  traces  in  T.  Note  that  these 
dynamic  events  are  uniquely  identified,  both  in  a  trace  and 
across  all  traces. 

•  Xes,  for  each  dynamic  execution  of  a  statement  s  generating 
exactly  the  events  in  Es  in  any  of  the  traces  in  T. 

Let  X  denote  a  solution  to  a  SAT  instance.  We  refer  to  the  values 
(from  the  set  {0, 1})  of  indicator  variables  Xe,  Xs .  and  X  e,  in 
solution  X  by  Xe,  X.,,  and  X®s ,  respectively. 

We  will  construct  our  constraints  to  guarantee  that,  if  our  in¬ 
stance  has  a  solution  X,  then  there  exists  an  NDSeq  specification 
for  V  with  respect  to  which  all  traces  in  T  are  safe .  and  the  follow¬ 
ing  hold: 

1.  The  set  S*  =  {s  j  Xs  =  1}  contains  the  statements  we  need  to 
surround  with  if(*)  in  the  inferred  NDSeq  specification. 

2.  For  each  event  e  in  a  trace  t  £  T,  if  Xe  =  1  then  event  e  is 
irrelevant  in  r,  i.e.,  e  0  Relevant (r,S*)  with  respect  to  the 
inferred  NDSeq  specification. 

3.  For  each  Xes,  if  X^s  =  1.  then  Xs  =  1,  and  thus,  statement 
s  will  be  enclosed  with  an  if(*),  and  that  if(*)  around  s  will 
make  events  in  E  all  irrelevant  (i.e.,  Xe  =  1  for  all  e  £  E). 

4.  For  each  conflict  cycle  C  in  some  trace  r  £  T,  Xe  =  1  for  at 
least  one  event  e  in  C.  This  means,  cycle  C  will  not  be  observed 
when  checking  conflict  serializability  over  Relevant (r,S*) 
computed  using  the  set  S*  given  by  solution  X. 

We  construct  the  full  SAT  instance  as  follows.  The  conditions 
R1  through  R4  are  from  Definition  5  of  relevant  events.  For  sim¬ 
plicity,  we  use  implications  in  constraints;  each  constraint  can  be 
trivially  translated  to  the  clause  form  (disjunction  of  literals)  by  us¬ 
ing  the  equivalence  (X\  ==>  Xf)  =  (~Xi  V  X2). 

(A)  Condition  R1  dictates  that  if  an  event  e  is  a  final  write  to  a 
variable  in  Focus,  then  e  is  relevant.  Thus,  for  each  such  event 
e,  Xe  must  be  0  in  the  solution.  We  ensure  this  by  substituting 
0  for  each  such  Xe  in  the  rest  of  the  formulation. 

(B)  Conditions  R2  and  R3  dictate  that  a  branch  event  e  -i.e.,  of 
type  branch(l)  for  some  local  l-  becomes  irrelevant  only  if  (a) 


e  is  generated  by  a  dynamic  instance  of  a  statement  s  which  is 
directly  enclosed  by  an  if(*)  and  (b)  all  the  events  generated 
by  that  instance  are  irrelevant.  To  ensure  (a)  and  (b)  we  add  the 
following  constraints. 

•  For  each  branch  event  e  we  add  the  constraint: 

(A'e  =*■  \/  XEs)  (1) 

e£Es 

•  For  each  dynamic  instance  of  statement  s,  producing  events 
E,  we  add  the  constraint: 

(. XEs  =>  X3)  (2) 

•  For  each  dynamic  instance  of  statement  s,  producing  events 
E,  and  for  each  e  £  E,  we  add  the  constraint: 

(. XEs  ==►  Xe)  (3) 

Therefore,  if  a  branch  event  e  needs  to  be  irrelevant,  the  solver 
must  find  a  dynamic  instance  of  some  s  with  all  its  events 
(including  e)  are  irrelevant,  and  s  must  be  enclosed  with  if(*). 

(C)  Condition  R4  dictates  that  if  an  event  e'  is  relevant  and  if  there 

is  another  event  e  such  that  e  e! ,  then  e  must  also  be 

relevant.  In  other  words,  if  e  needs  to  be  irrelevant,  e!  must  be 
irrelevant,  too.  To  ensure  this,  we  add  the  following  constraint 
for  each  pair  of  events  e,  e!  such  that  e  --  +  *  e!  for  some  t: 

(Xe  =>  Xe>)  (4) 

(D)  Given  two  statements  s  and  s',  we  say  that  s  overlaps  with  s'  if 
s  is  not  nested  inside  s',  s'  is  not  nested  inside  s,  and  there  is  a 
statement  s"  nested  inside  both  s  and  s' . 

If  we  have  two  overlapping  statements,  then  we  cannot  surround 
both  of  them  by  if(*)  because  such  an  action  would  result  in 
an  invalid  program.  For  example,  consider  the  sequential  com¬ 
position  of  three  statements  si;  s2;  s3.  Then  statements  si;  s2 
and  s2;  s3  overlap  with  each  other  and  it  is  easy  to  see  that  we 
cannot  surround  both  of  them  with  if(*)  simultaneously.  Our 
constraint  system  ensures  this  restriction  by  adding  the  follow¬ 
ing  constraint  for  every  pair  of  overlapping  statements  s,  s': 

(Xs  =►  -X3,)  (5) 

(E)  Finally,  we  need  to  ensure  that  at  least  one  event  in  each  conflict 
cycle  must  be  irrelevant  so  that  we  can  use  Theorem  2  to 
conclude  that  all  the  traces  in  T  are  safe  with  respect  to  the 
inferred  NDSeq  specification.  To  ensure  this,  for  each  set  C 
of  events  forming  a  conflict  cycle  in  a  trace  r  £  T.  we  add 
constraint: 

V  (6) 

eec 

If  the  above  constraint  system  is  satisfiable,  then  a  solution  to 
it  gives  us  an  NDSeq  specification.  In  particular,  if  X  is  a  solution 
to  the  constraint  system,  then  S*  =  {s  |  Xs  =  1}  is  the  set  of 
statements  to  enclose  with  if(*)  in  the  NDSeq  specification.  Then, 
all  the  traces  in  T  are  safe  with  respect  to  the  inferred  specification. 

Theorem  3  (Inference).  Given  a  set  T  of  parallel  execution  traces 
of  program  V  and  focus  variables  Focus,  assume  that  our  SAT 
instance  is  satisfiable,  and  let  S*  be  the  set  of  statements,  inferred 
from  the  solution,  to  be  enclosed  with  if(*).  Then,  every  trace  in  T 
is  safe  with  respect  to  Focus  and  the  inferred  NDSeq  specification. 

We  give  the  proof  of  Theorem  3  in  Appendix  A. 

A  noteworthy  implication  of  this  theorem  is  that,  if  there  is 
no  way  to  show  that  a  trace  in  T  is  safe  by  only  adding  if(*)s 
to  statements,  the  solver  reports  that  the  solution  is  unsatisfiable. 


Thus,  an  unsatisfiable  instance  indicates  that  one  of  the  traces  in  T 
is  likely  to  contain  parallelism  errors,  such  as  atomicity  violations. 

4.2  MinCostSAT  Solving  for  a  Minimal  NDSeq  Specification 

The  SAT  formulation  above  guarantees  that  if  there  is  an  NDSeq 
specification,  in  which  a  set  S*  of  statements  are  enclosed  in  if(*), 
that  can  show  that  the  traces  in  T  are  safe,  then  there  exists  a 
solution  X  that  selects  S*.  i.e.,  forall  s  £  S*.  X.,  =  1.  But, 
we  have  not  guaranteed  that  such  a  solution  selects  exactly  S*.  In 
other  words,  the  solution  may  tell  us  to  enclose  with  if(*)  more 
statements  than  those  in  S*.  In  this  case,  there  is  a  risk  that  some 
statements  can  be  unnecessarily  surrounded  by  if  (*) .  Since  adding 
if(*)  may  cause  adding  more  behaviors  to  the  program,  one  could 
end  up  adding  if  (*)s  that  violate  functional  correctness.  Therefore, 
we  need  a  mechanism  to  find  a  solution  to  the  constraint  system 
so  that  functional  correctness  is  not  broken.  We  noticed  that  if  we 
add  a  minimal  number  of  if(*)s  then  there  is  a  lower  chance  of 
breaking  the  functional  correctness.  For  this,  we  re-formulate  the 
SAT  problem  above  as  a  MinCostSAT  problem. 

MinCostSAT  is  a  special  form  of  SAT,  where,  in  addition  to  the 
constrains  above,  a  cost  function  C  assigns  each  variable  a  non¬ 
negative  cost.  The  solver  is  asked  to  find  a  solution  that  not  only 
satisfies  all  the  constraints,  but  also  minimizes  the  sunt  of  the  costs 
of  the  variables  that  are  assigned  1  in  the  solution. 

Our  MinCostSAT  formulation  contains  all  the  constraints  (A)- 
(E)  given  above.  In  addition,  we  define  the  cost  function  C  such 
that  for  each  Xs,  C(A'S)  =  1,  and  for  other  variables  X., 
C(X.)  =  0.  Therefore,  the  solver  optimizes  the  objective  1 

minimize  Y  x« 

s  in  V 

In  this  formulation,  Xa  is  assigned  1  only  when  a  branch  event  e 
must  be  marked  irrelevant  to  discharge  a  conflict  cycle,  and  for  this, 
s  must  be  surrounded  with  if(*).  Otherwise,  Xs  is  assigned  0  to 
minimize  the  cost. 

Note  that,  adding  only  the  minimum  number  of  necessary 
if(*)s  to  the  inferred  NDSeq  specification  is  a  heuristic  to  reduce 
the  risk  of  violating  the  functional  specification.  In  other  words, 
if  we  find  a  solution  to  our  MinCostSAT  formulation  above,  then 
we  have  inferred  a  likely  NDSeq  specification  of  the  parallel  pro¬ 
gram,  and  that  specification  may  still  violate  the  functional  cor¬ 
rectness  specification  of  the  program.  If  we  find  no  solution,  then 
probably  there  is  no  NDSeq  specification  for  the  parallel  program. 
Thus,  we  foresee  a  repetitive  process  for  finding  the  right  NDSeq 
specification,  in  which  the  user  sequentially  checks  the  functional 
correctness  specification  (e.g.,  assertions)  after  inferring  an  NDSeq 
specification,  and  if  any  functional  correctness  criterion  is  violated, 
rules  out  the  current  placement  of  if(*)s  for  the  next  iteration  of 
NDSeq  specification  inference. 

4.3  Optimizations 

We  conclude  this  section  by  presenting  two  optimizations  that  we 
observed  to  have  significant  effect  in  simplifying  the  constraint 
system,  and  thus  reducing  the  MinCostSAT  solving  time  from 
minutes  to  several  seconds. 

Using  Dynamic  Slicing:  Recall  that  in  Constraint  (A)  we  pre¬ 
assign  0  to  each  Xe  if  e  is  a  final  write  to  a  focus  variable;  the 
values  for  indicator  variables  of  other  events  are  computed  during 
the  SAT  solving.  By  using  the  dynamic  slice  [1]  of  the  trace,  we 


1  This  formulation  can  also  be  mapped  to  a  Partial  Maximum  Satisfiability 
problem  (PMAX-SAT),  which  contains  our  constraints  in  (A)-(E)  as  hard 
constraints  and  for  each  variable  Xs  a  soft  constraint  (-uYs);  the  objective 
is  to  satisfy  all  hard  constraints  and  maximum  number  of  soft  constraints. 


can  improve  this  by  providing  values  for  more  variables  before  the 
SAT  solving. 

Let  °->T  denote  the  control  dependence  between  the  events  in 
trace  r,  and  — >*  denote  the  transitive  closure  of  (--■ *T  U  °->T). 
Recall  that  —*T  denotes  data  dependence  defined  in  Section  3.3, 
and  see  Appendix  B  for  the  formal  definition  of  ’—*T. 

A  dynamic  slice  of  a  trace  r  with  respect  to  the  focus  variables, 
denoted  by  DSlice(r),  is  the  set  of  events  from  r  such  that  e  £ 
DSlice(r)  iff  there  exists  an  event  e'  :  g  =  l  in  r,  e1  is  the  last 
write  to  g  £  Focus  in  Thread(e'),  and  e  — >*  e’ . 

Lemma  1.  V<S*.  DSlice(r )  C  Relevant(T,  S *). 

We  give  the  proof  of  the  lemma  in  Appendix  B. 

Lemma  1  indicates  that  given  an  input  trace  r  £  T  to  our 
MinCostSAT  formulation,  the  set  of  relevant  events  in  r  given  by 
a  solution  will  always  be  a  superset  of  the  dynamic  slice  of  r;  this 
result  holds  for  any  inferred  NDSeq  specification.  In  other  words, 
if  e  £  DSlice(r)  then  it  must  be  relevant.  Thus,  we  can  safely 
modify  (A),  which  pre-assigns  0  to  Xe  only  if  e  is  a  final  write  to  a 
focus  variable,  in  Section  4. 1  as  follows. 

(A’)  For  each  event  e  £  DSlice(r ),  Xe  must  be  0  in  the  solution. 

Thus,  we  substitute  0  for  each  such  e  in  the  constraint  system. 

Grouping  Events:  The  formulation  of  MinCostSAT  in  Section  4 
considers  all  the  events  in  the  execution  and  the  dependencies  be¬ 
tween  those  events.  This  could  lead  to  large  MinCostSAT  instances 
that  are  expensive  to  solve.  We  address  this  situation  by  grouping 
events  into  disjoint  sets.  Whenever  we  see  an  execution  of  a  state¬ 
ment  that  is  completely  excluded  from  the  dynamic  slice,  we  treat 
all  the  events  e  =  {ei, ...,  en}  in  that  dynamic  execution  instance 
of  the  statement  as  a  single  (compound)  event.  For  each  i  £  [1,  n], 
we  then  replace  the  variable  Xei  in  all  constraints  of  the  MinCost¬ 
SAT  by  the  variable  Xe  and  we  lift  the  constraints  described  in 
Section  4  to  sets  of  events.  In  this  way,  we  can  ignore  the  depen¬ 
dency  relationship  between  the  events  within  a  group  and  concen¬ 
trate  on  inter-group  dependencies.  Although  grouping  events  in  this 
way  may  result  in  less  optimal  solutions  (with  higher  cost  than  the 
original  and  more  general  formulation  in  Section  4),  in  our  experi¬ 
ments,  we  confirmed  that  it  does  not  affect  the  final  solution  for  our 
benchmarks. 

5.  Experimental  Evaluation 

In  this  section,  we  describe  our  efforts  to  experimentally  evaluate 
our  approach  to  inferring  likely  nondeterministic  sequential  (ND¬ 
Seq)  specifications  for  parallel  programs.  In  particular,  we  aim  to 
evaluate  the  following  claim:  By  examining  a  small  number  of  rep¬ 
resentative  executions,  our  specification  inference  algorithm  can 
automatically  generate  the  correct  set  of  if(*)  annotations  for  real 
Java  programs. 

To  evaluate  this  claim,  we  implemented  our  technique  in  a  pro¬ 
totype  tool  for  Java,  called  NDETERMIN,  and  applied  NDETERMIN 
tool  to  the  set  of  Java  benchmarks  for  which  we  have  previously 
and  manually  written  NDSeq  specifications  [4],  We  compared  the 
quality  and  accuracy  of  our  automatically-inferred  if(*)s  to  the 
ones  in  their  manually-written  NDSeq  specifications. 

Our  prototype  tool  NDETERMIN  uses  bytecode  instrumentation 
via  Soot  [32],  During  the  instrumentation  phase,  we  compute  the 
control  dependencies  between  statements  and  identify  the  candi¬ 
date  static  blocks  that  could  be  annotated  with  if(*).  While  Sec¬ 
tion  4  describes  our  algorithm  over  structured  statements,  our  im¬ 
plementation  handles  unstructured  statements  of  Java.  We  use  ac¬ 
tive  random  testing  to  generate  parallel  execution  traces,  and  use 
the  MinCostChaff  [17]  and  MiniSat+[9]  solvers  to  solve  the  Min¬ 
CostSAT  instances  generated  from  these  traces. 


Benchmark 

Benchmark  Description 

Approximate 
Lines  of  Code 
(App  +  Lib) 

#  of  Parallel 

Constructs 

Size  of 

NDSe 

#  of 

if(*)’s 

Manual 
:q  Spec 

#  focus 

stmts 

Size  ol 
(Eve 

All 

f  Trace 
:nts) 

Sliced 

Out 

Inferred 

Specif 

#  of 

if(*)’s 

1  NDSeq 
ication 

Correct? 

JGF 

sor 

successive  over-relaxation 

300 

1 

0 

1 

905k 

561k 

0 

yes 

matmult 

sparse  matrix-vector  multiplication 

700 

1 

0 

1 

962k 

8k 

0 

yes 

series 

coefficients  of  Fourier  series 

800 

1 

0 

5 

2008k 

1215 

0 

yes 

crypt 

encryption  and  decryption 

1100 

2 

0 

3 

493k 

100k 

0 

yes 

moldyn 

molecular  dynamics  simulation 

1300 

4 

0 

1 

4517k 

4300k 

0 

yes 

lufact 

LU  factorization 

1500 

1 

0 

1 

1048k 

792k 

0 

yes 

raytracer 

ray  tracing 

1900 

1 

0 

1 

9125k 

8960k 

- 

raytracer  (fixed) 

corrected  ray  tracing 

1900 

1 

0 

1 

9125k 

8960k 

0 

yes 

montecarlo 

Monte  Carlo  derivative  pricing 

3600 

1 

0 

1 

1723k 

731k 

0 

yes 

PJ 

pi3 

Monte  Carlo  approximation  of  tc 

150  +  15,000 

1 

0 

1 

1062k 

141 

0 

yes 

keysearch3 

cryptographic  key  cracking 

200  +  15,000 

2 

0 

4 

1062k 

1049k 

0 

yes 

mandelbrot 

fractal  (Mandelbrot  set)  rendering 

250  +  15,000 

1 

0 

6 

576k 

330k 

0 

yes 

phylogeny 

branch-and-bound  search 

4400  +  15,000 

2 

3 

8 

29k 

24k 

- 

phylogeny  (fixed) 

corrected  branch-and-bound  search 

4400  +  15,000 

2 

3 

8 

29k 

24k 

1 

yes 

stack 

Treiber  non-blocking  stack  [30] 

40 

1 

2 

8 

1050 

356 

2 

yes 

queue 

non-blocking  queue  [25] 

60 

1 

2 

8 

325 

114 

2 

yes 

meshrefine 

Delaunay  mesh  refinement 

1000 

1 

2 

50 

930k 

845k 

2 

yes 

Table  1.  Experimental  results.  All  if  (*)  annotations  inferred  by  our  tool  were  verified  manually  to  be  correct. 


Limitations:  In  Java,  it  is  necessary  to  handle  language  features 
such  as  objects,  exceptions,  casts,  etc.  While  our  implementation 
supports  many  intricacies  of  the  Java  language,  it  has  a  couple 
of  limitations.  First,  our  implementation  tracks  neither  the  shared 
reads  and  writes  made  by  uninstrumented  native  code,  nor  the  flow 
of  data  dependence  through  such  native  code.  Second,  in  order  to 
reduce  the  runtime  overhead,  our  tool  does  not  instrument  all  of  the 
Java  standard  libraries.  Thus,  we  could  miss  conflicts  or  data  depen¬ 
dencies  carried  out  through  the  native  code  and  the  Java  libraries, 
and  fail  to  include  some  events  in  our  inference  algorithm.  To  ad¬ 
dress  the  second  limitation,  for  certain  shared  data  structure  objects 
we  introduced  shared  variables  and  inserted  reads  or  writes  of  those 
variables  whenever  their  corresponding  objects  were  accessed.  This 
allowed  us  to  conservatively  approximate  the  conflicts  and  data  de¬ 
pendencies  for  certain  critical  standard  Java  data  structures.  We  did 
not  observe  any  inaccuracy  in  our  experimental  results  due  to  these 
limitations. 

5.1  Experimental  Setup 

The  names,  sizes,  and  brief  descriptions  of  the  benchmarks  we  used 
to  evaluate  NDETERMIN  are  listed  in  Table  1.  Several  benchmarks 
are  from  the  Java  Grande  Forum  (JGF)  benchmark  suite  [29]  and 
the  Parallel  Java  (PJ)  Library  [20],  For  our  benchmarks,  we  use  the 
same  focus  variable  and  parallel  region  annotations  as  in  [4],  (Al¬ 
though  these  benchmarks  are  written  in  a  structured  parallel  style, 
they  use  explicit  Java  threads  as  Java  does  not  provide  cobegin 
or  coforeach  constructs.  Thus,  the  code  was  annotated  to  indi¬ 
cate  which  regions  of  code  correspond  to  the  bodies  of  structured 
cobegin's  or  coforeach’s.) 

Note  that  benchmarks  raytracer  and  phylogeny  both  con¬ 
tain  parallelism  errors.  Thus,  we  apply  NDETERMIN  to  both  the 
original  version  of  each  benchmark,  and  a  version  in  which  the  er¬ 
ror  has  been  fixed.  (For  raytracer,  we  modify  a  synchronized 
block  to  use  the  correct  shared  lock  to  protect  the  key  global 
variable  checksuml.  For  phylogeny,  we  make  one  method 
synchronized  in  order  to  eliminate  an  atomicity  error.) 

We  execute  each  benchmark  five  times  on  a  single  test  input, 
using  a  simple  implementation  of  race-directed  active  random  test¬ 
ing  [28],  For  each  benchmark,  NDETERMIN  analyzes  all  five  exe¬ 
cutions  and  either  infers  a  placement  of  iff*)  for  the  benchmark's 
NDSeq  specification  or  reports  that  the  benchmark  satisfies  no  pos¬ 
sible  NDSeq  specification  due  to  a  parallel  error. 

We  performed  our  experiments  on  a  64-bit  Linux  machine  with 
a  dual  Quad-Core/HT  Intel(R)  Xeon(R)  CPU  (2.67GHz)  processor. 


24MB  L3  cache  and  48GB  of  DDR3/1066  RAM.  For  each  experi¬ 
ment,  we  measured  the  time  for  solving  both  SAT  instances  (with¬ 
out  minimizing  the  number  of  if(*)s)  using  ZChaff  and  MiniSat, 
and  MinCostSAT  instances  using  MinCostChaff  and  MiniSat+  gen¬ 
erated  during  the  experiment.  We  observed  that  for  the  benchmarks 
that  do  not  require  nondeterministic  branches  in  their  NDSeq  spec¬ 
ifications,  the  solving  time  for  SAT  and  MinCostSAT  are  very  close 
to  each  other,  as  the  solver  can  satisfy  all  the  constraints  with¬ 
out  needing  to  optimize  the  number  of  if(*)s.  For  most  of  the 
benchmarks,  the  solving  time  was  in  terms  of  milliseconds,  and  few 
large  benchmarks  required  several  seconds  to  solve  the  constraints. 
Therefore,  in  the  following  we  report  on  the  quality  and  accuracy 
of  the  inferred  specifications. 

5.2  Experimental  Results 

The  results  of  our  experimental  evaluation  are  summarized  in  Ta¬ 
ble  1.  The  column  labeled  ‘All",  under  “Size  of  Trace  (Events)”,  re¬ 
ports  the  number  of  total  events  seen  in  the  last  execution  (of  five) 
of  each  benchmark,  and  the  column  labeled  “Sliced  Out”  reports 
the  number  of  events  removed  by  our  dynamic  slicing.  NDETER¬ 
MIN  searches  for  if(*)  placements  to  eliminate  cycles  of  transac¬ 
tional  conflicts  involving  sliced  out  events. 

The  second-to-last  column  of  Table  1  reports  the  number  of 
if(*)  constructs  in  the  inferred  NDSeq  specification  for  each 
benchmark.  We  manually  determined  whether  each  of  the  inferred 
if(*)  annotations  was  correct — i.e.,  captures  all  intended  nonde¬ 
terminism,  so  that  the  parallel  program  is  equivalent  to  its  NDSeq 
specification,  but  no  extraneous  nondeterminism  that  would  allow 
the  NDSeq  version  of  the  program  to  produce  functionally  incor¬ 
rect  results.  All  of  the  inferred  specifications  were  correct. 

For  many  of  the  benchmarks,  NDETERMIN  correctly  infers  that 
no  if(*)  constructs  are  necessary.  All  but  one  of  these  benchmarks 
are  simply  conflict-serializable.  As  discussed  in  [4],  montecarlo 
is  not  conflict-serializable,  but  the  non-serializable  conflicts  afftect 
neither  the  control-flow  nor  the  final  result  of  the  program. 

For  benchmarks  stack,  queue,  and  meshrefine,  NDETER¬ 
MIN  infers  an  NDSeq  specification  exactly  equivalent  to  our  man¬ 
ual  specifications  from  [4],  That  is,  NDETERMIN  infers  the  same 
number  of  if(*)  constructs  and  places  them  in  the  same  locations 
as  in  previous  manually-written  NDSeq  specifications.  We  note  that 
NDETERMIN  finds  specifications  slightly  smaller  than  the  manual 
ones,  which  include  a  small  number  of  adjacent  statements  in  the 
if  (*)  that  do  not  strictly  need  to  be  enclosed,  although  in  each  case 


the  overall  behavior  of  the  NDSeq  specification  is  the  same  whether 
or  not  these  statements  are  included  in  the  if(*). 

Further,  for  benchmark  phylogeny  (fixed),  while  the  previ¬ 
ous  manual  NDSeq  specification  included  three  if(*)  constructs, 
NDETERMIN  correctly  infers  that  only  one  of  these  three  is  actually 
necessary.  The  extra  if(*)  appear  to  have  been  manually  added  to 
address  some  possible  parallel  conflicts  that,  in  fact,  can  never  be 
involved  in  non-serializable  conflict  cycles.  That  is,  these  two  ex¬ 
traneous  iff*)  allow  the  NDSeq  specification  to  perform  several 
nondeterministic  behaviors  seen  during  parallel  execution  of  the 
benchmark.  But  NDETERMIN  correctly  determines  that  these  be¬ 
haviors  are  possible  in  the  NDSeq  specification  even  without  these 

Note  that  for  two  benchmarks,  raytracer  and  phylogeny, 
NDETERMIN  correctly  reports  that  no  NDSeq  specification  (i.e., 
no  solution  to  the  SAT  instance)  exists  (indicated  by  in  Ta¬ 
ble  1).  That  is,  NDETERMIN  detects  that  the  events  of  the  dynamic 
slice  (i.e.,  those  not  removed  by  dynamic  slicing)  are  not  conflict- 
serializable.  These  conflicts  exist  because  both  benchmarks  contain 
parallelism  errors  (atomicity  errors  due  to  insufficient  synchroniza¬ 
tion).  As  a  result  of  these  errors,  these  two  parallel  applications  can 
produce  incorrect  results  that  no  sequential  version  could  produce. 

Discussion:  These  experimental  results  provide  promising  pre¬ 
liminary  evidence  for  our  claim  that  NDETERMIN  can  automat¬ 
ically  check  serializability  by  way  of  inferring  if(*)  necessary 
for  the  NDSeq  specification  of  parallel  correctness  for  real  paral¬ 
lel  Java  programs.  We  believe  adding  nondeterministic  if(*)  con¬ 
structs  is  the  most  difficult  piece  of  writing  a  NDSeq  specification, 
and  thus  our  inference  technique  can  make  using  NDSeq  specifica¬ 
tions  much  easier.  Further,  such  specification  inference  may  allow 
for  fully-automated  testing  and  verification  to  use  NDSeq  specifi¬ 
cations  to  separately  address  parallel  and  functional  correctness. 

6.  Related  Work 

Several  parallel  correctness  criteria,  including  data-race  free¬ 
dom  [26],  atomicity  [16],  linearizability  [19],  and  determinism  [2, 
5]  have  been  studied  for  shared  memory  parallel  programs  that  sep¬ 
arate  the  reasoning  about  functionality  and  parallelism  at  different 
granularities  of  execution.  All  these  criteria  provide  the  separation 
between  parallel  and  functional  correctness  partially,  as  the  restric¬ 
tion  on  thread  interleavings  is  limited,  for  example,  to  atomic  block 
boundaries.  NDSeq  specifications  [4]  develop  this  idea  up  to  a  com¬ 
plete  separation  between  parallelism  and  functionality  so  that  the 
programmer  can  reason  about  the  intended  functionality  by  exam¬ 
ining  a  sequential,  but  nondeterministic,  program. 

Reasoning  about  conflicting  accesses  that  are  simultaneously 
enabled  but  ineffective  on  the  rest  of  the  execution  is  the  main  chal¬ 
lenge  in  both  static  [7,  16,  31]  and  dynamic  [3,  14,  21,  23,  33,  34] 
techniques  for  checking  atomicity  and  linearizability.  The  Purity 
work  [13]  and  QED  [10]  provide  static  analyses  to  rule  out  spuri¬ 
ous  warnings  due  to  such  conflicts  by  abstracting  these  operations 
to  no-op’s.  Their  abstraction  techniques  resemble  identifying  irrel¬ 
evant  events  by  a  dependency  analysis.  However,  lack  of  dynamic 
information  during  the  static  verification  is  a  bottleneck  in  automat¬ 
ing  their  overall  approach 

DETERMIN  [6]  infers  likely  semantic  determinism  specifica¬ 
tions  [5]  for  parallel  programs.  This  is  done  by  monitoring  and  an¬ 
alyzing  program  states  during  parallel  executions  of  the  program 
and  generating  pre-  and  post-  bridge  predicates  [5],  conjunctions  of 
equality  predicates  over  program  variables  from  pairs  of  program 
executions. 

There  is  a  rich  literature  on  generating  specifications/invariants 
for  sequential  programs.  Daikon  [11]  automatically  infers  likely 
program  invariants  using  statistical  inference  from  a  program's  ex¬ 


ecution  traces.  Csallner  et  al.  [8]  propose  an  approach,  called  DySy, 
that  combines  symbolic  execution  with  dynamic  testing  to  infer 
preconditions  and  postconditions  for  program  methods.  Hangal  and 
Lam  [18]  propose  DIDUCE,  which  uses  online  analysis  to  discover 
simple  invariants  over  the  values  of  program  variables.  Deryaft  [24] 
is  a  tool  that  specializes  in  generating  constraints  of  complex  data 
structures.  Logozzo  [22]  proposed  a  static  approach  that  derives 
invariants  for  a  class  as  a  solution  of  a  set  of  equations  derived 
from  the  program  source.  Houdini  [15]  is  an  annotation  assistant 
for  ESC/Java.  It  generates  a  large  number  of  candidate  invariants 
and  repeatedly  invokes  the  ESC/Java  checker  to  remove  unprov- 
able  annotations,  until  no  more  annotations  are  refuted. 
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A.  Correctness  of  Inference  Algorithm 

Let  T  be  a  set  of  parallel  execution  traces  of  program  V ,  and  Focus 
be  a  set  of  focus  variables.  Let  X  be  a  satisfying  solution  to  the  SAT 
formulation  in  Section  4.1,  and  S*  contain  the  set  of  statements 
to  enclose  with  if(*)  in  the  inferred  NDSeq  specification,  i.e., 

s  £  S*  iff  Xs  =  1. 


Note  that,  we  have  already  proved  (in  [4] )  Theorem  2,  which 
says  that  checking  conflict  serializability  of  each  trace  r  in  T  by 
only  considering  Relevant  (r,  5*)  is  sound:  if  we  find  no  conflict 
cycles  after  omitting  irrelevant  events,  then  r  is  safe  with  respect  to 
the  inferred  NDSeq  specification.  Therefore,  in  the  following  it  is 
enough  to  show  that  the  solution  to  the  SAT  formulation  gives  us  a 
superset  of  Relevant  (r,  S*),  that  is,  if  e  £  Relevant  (r,  S*),  then 
the  solution  indicates  that  e  is  relevant. 

Lemma  2.  Let  T  be  a  set  of  parallel  execution  traces  of  program 
V.  Let  S*  be  a  set  ofif(*)  inferred  by  the  above  algorithm,  given 
T  and  focus  variables  Focus.  Let  S*  correspond  to  solution  X  of 
the  constraint  system  built  by  the  inference  algorithm. 

For  each  event  e  in  r  £  T,  if  e  £  Relevant  (t,  S*),  then 
Xe  =  0. 

Proof.  Recall  the  conditions  R1-R4  in  Definition  5  for  an  event 
e  £  r  to  be  in  the  set  Relevant(T,S*).  Given  these  condi¬ 
tions,  think  of  an  iterative  procedure  to  compute  the  relevant 
events:  Relevant (r,S*)  is  initialized  to  an  empty  set,  and  at  each 
step,  one  of  the  rules  R1-R4  is  applied  to  add  a  new  event  to 
Relevant until  Relevant (t,S*)  does  not  change.  We  do 
the  proof  by  induction  on  the  length  of  this  iteration.  The  base  case 
where  Relevant  (r,S*)  —  0  is  trivial.  In  the  following,  we  do  a 
case  split  on  the  condition  that  causes  event  e  to  be  added  to  set 
Relevant(r ,  S*). 

R1  In  this  case,  e  is  the  last  write  to  a  focus  variable  by  one  of  the 
threads.  Therefore,  by  Constraint  (A),  Xe  is  always  substituted 
by  0. 

R2  In  this  case,  e  is  a  branch(l)  event  and  NdBlock[e )  is  un¬ 
defined,  i.e.,  e  is  not  generated  by  a  statement  enclosed  with 
if(*).  Consider  each  XEs  term  in  Constraint  (B.l)  for  e: 

(A'e  =►  \/  XEs) 

eeEs 

For  each  dynamic  execution  Es  such  that  e  £  Es,  s  is  not 
enclosed  with  if(*);  otherwise  NdBlock(e)  would  be  defined. 
Thus,  it  must  be  the  case  that  s  S*  and  Xs  =  0.  Therefore, 
by  Constraint  (B.3),  each  such  X_Ba  is  0.  Finally,  by  Constraint 
(B.l),  we  have  Xe  =  0. 

R3  In  this  case,  e  is  a  branch{l )  event,  NdBlock(e)  =  Es,  and 
there  is  an  event  e!  £  Es  such  that  e!  £  Relevant By 
inductive  hypothesis,  Xe/  =  0.  To  reach  a  contradiction,  as¬ 
sume  that  Xe  =  1,  and  consider  each  XEi  term  in  Constraint 
(B.l)  fore: 

(Xe  =>  \/  Xs,iB,) 

e£E' , 

To  satisfy  this  constraint,  there  must  be  at  least  one  dynamic 
execution  E'a,  such  that  XE>  =  1.  Moreover,  by  Constraint 
(B.3),  for  every  e"  £  E's Xe//  =  1  must  hold.  Thus,  E's, 
cannot  contain  e!  for  which  Xe/  =  0. 

Since  XB/  =  1,  Constraint  (B.2)  ensures  that  Xs/  =  1,  so  s' 
is  enclosed  with  if(*).  Note  that,  by  definition  of  NdBlock(e ), 
s  is  also  enclosed  with  if(*).  By  Constraint  (D.5).  statements 
s  and  s'  cannot  overlap;  either  Es  and  E's,  are  disjoint  or  one 
contains  the  other.  And  Ea  and  E's,  are  not  disjoint,  because 
e  is  in  both.  Again  by  definition  of  NdBlock(e),  Ea  is  the 
smallest  nondeterministic  branch  containing  e,  so  it  is  the  case 
that  Ea  C  E's/.  Since  e!  £  E,  e'  £  E'  must  hold.  This 
contradicts  with  our  assumptions  that  E'  cannot  contain  e' . 

R4  In  this  case,  e  e'  for  some  e'  already  in  Relevant.  There¬ 
fore,  by  inductive  hypothesis,  Xe/  =  0.  Constraint  (C.l)  en¬ 
sures  that  Xe  =  0. 


□ 

Theorem  3  (Inference).  Given  a  set  T  of  parallel  execution  traces 
of  program  V  and  focus  variables  Focus,  assume  that  our  SAT 
instance  is  satisfiable,  and  let  S*  be  the  set  of  statements,  inferred 
from  the  solution,  to  be  enclosed  with  if(*).  Then,  every  trace  in  T 
is  safe  with  respect  to  Focus  and  the  inferred  NDSeq  specification. 

Proof.  Suppose  some  r  £  T  is  not  safe  with  respect  to  the  inferred 
NDSeq  specification.  That  is,  there  exist  events  C  =  ei , . . . ,  e/t 
that  are  all  relevant  and  that  form  a  cycle  of  conflicts  between 
the  threads  of  r.  Note  that  the  ei, ...  ,ek  are  all  events  of  type 
“x  =  x'”.  (A  branch(l)  event  can  not  appear  in  a  cycle  as  l  is  a 
local  variable.) 

The  inferred  if(*)  locations  S*  correspond  to  a  solution  X 
to  the  constraint  system  built  by  our  inference  algorithm.  By 
Lemma  2,  because  the  ei , . . . , e&  are  all  relevant,  we  have: 

Xei  -  Xe2  =  •  •  •  =  Xefc  =  0 

But  this  contradicts  Constraint  (E) — that,  because  ei, . . . ,  e*  form 
a  conflict  cycle,  the  solution  X  must  satisfy  the  following  con¬ 
straint: 

(Xei  VX£!  V'"VXeJ 

Therefore,  for  each  conflict  cycle,  at  least  one  event  in  the  cycle 
must  be  marked  irrelevant  in  the  solution  X.  Theorem  2  states  that 
in  this  case  all  the  traces  are  safe  because  there  exists  no  conflict 
cycles  with  all  relevant  events.  □ 

B.  Using  Dynamic  Slicing 

To  introduce  the  dynamic  slicing,  we  need  to  define  the  control 
dependence  relation  °->T  between  events  of  a  trace  r. 

Cl  (Control  Dependence).  For  a  branch  evente;  :  branch(l)  and 
any  event  ej,  we  add  a  ej,  if  e,  is  control  dependent  on 
d.  Note  that  a  is  control  dependent  on  ej  if  and  only  if 

1.  d  is  the  branch{l)  event  generated  from  the  execution  of  a 
if  (l)  s  or  while(Z)  s, 

2.  ej  is  generated  by  the  execution  of  a  statement  s'  contained 
in  the  nested  s  statement,  and 

3.  no  other  conditional  or  loop  in  s  contains  the  statement  s'. 

Let  — >*  denote  the  transitive  closure  of  (--+r  U  »T). 

Dynamic  Slice  A  dynamic  slice  of  a  trace  with  respect  to  the 
focus  variables  is  computed  as  follows.  We  first  compute  the 
set  Target(r)  =  {e:g  =  l£r\g£  Focus  A 
e  is  last  write  to  g  in  Thread  (e)},  i.e.  the  set  of  all  events  that  di¬ 
rectly  affect  the  final  output.  Then  a  dynamic  slice  of  a  trace 
r,  denoted  by  DSlice(r),  is  the  set  {e  £  r  |  3e'  £ 

Target(r)  such  that  e  — >*  e1},  i.e.  the  set  of  all  events  that  di¬ 
rectly  or  indirectly  affect  the  final  output. 

The  following  observation  enables  us  to  perform  the  optimiza¬ 
tion  given  in  Section  4.3  to  improve  the  efficiency  of  solving  our 
MinCostSAT  instances. 

Lemma  1.  Given  an  input  trace  t  £  T  to  our  SAT  formulation, 
a  dynamic  slice  of  r  is  always  a  subset  of  the  set  of  relevant 
events  given  by  a  solution  ( independent  of  the  inferred  NDSeq 
specification).  That  is,  VS*.  DSlicefir)  C  Relevant(r,S*). 


Proof.  Note  that  both  DSlice{r)  and  our  SAT  instance  in  Sec¬ 
tion  4.1  use  the  same  set  Focus  of  focus  variables  given  by  the 
user.  Let  X  be  a  solution  to  our  SAT  formulation.  We  show  that  if 
e  £  DSlicefr),  then  e  cannot  be  marked  irrelevant,  i.e.,  Xe  =  0. 

e  £  DSlice(r)  holds  if  either  e  is  the  last  write  to  some 
g  £  Focus  or  e  — >*  e'  for  some  e'  £  DSliceir).  In  the  former 
case  (e  £  DSlicelyr)),  Part  (A)  of  the  SAT  formulation  ensures  that 
Xe  =  0.  For  the  latter  case  (e  — >*  e'  and  e!  £  DSlicefir))  we 
do  a  proof  by  induction  on  the  length  of  — >.  For  the  induction, 
we  assume  that  e'  is  marked  relevant,  so  Xe/  =  0.  In  the  base 
case  e  =  e',  and  thus,  X,.  =  0.  In  the  inductive  case,  we  rely 
on  the  contrapositive  form  of  Lemma  3  (proved  below):  if  e'  is 
marked  relevant,  either  e  is  marked  relevant  or  e  — >*  e'  does  not 
hold.  □ 

Lemma  3.  Given  an  input  trace  r  £  F  to  our  SAT  formulation 
and  a  NDSeq  specification,  if  there  exists  two  events  e  and  e!  in  r 
such  that  e  is  irrelevant  and  e  — >*  e! ,  then  e'  must  be  irrelevant. 

Proof.  We  prove  the  contrapositive  form  of  the  statement.  Assume 
that  e!  is  marked  relevant,  i.e.,  Xe/  =  0,  and  e  — e' .  We  prove 
that  e  is  marked  relevant,  i.e.,  Xe  =  0,  in  the  solution. 

We  do  induction  on  the  length  of  — >*.  The  base  case  when 
e  =  e'  is  trivial.  Now  assume  that  e  — >  e"  — >*  e'  for  some  e" . 
By  inductive  hypothesis  e"  is  also  marked  relevant,  so  Xe//  =  0. 
By  definition  of  — >,  either  e  --•»  e"  or  e  c— >  e" .  In  the  former 
case,  Constraint  (4)  ensures  that  Xe  =  0  (thus  the  relations  in 
D Slice (r)  due  to  D1  and  D2  are  respected  by  the  solution). 

Now  assume  the  latter  case:  e  <— >  e" .  In  this  case  e  =  branch(l) 
for  some  l.  To  reach  a  contradiction,  let  us  assume  Xe  =  1.  In 
this  case,  Constraint  (1)  ensure  that  there  is  some  X_gs  =  1  for 
some  statement  s  enclosed  with  iff*),  thus  Xs  =  1  also  holds.  In 
addition,  by  Constraint  (3),  Xe///  =  1  must  hold  for  all  e"  £  E. 
By  definition  of  •— >  and  that  if(*)s  are  structured,  e"  must  be 
in  E,  and  thus  Xe//  =  1,  contradicting  with  our  assumption  that 
Xe/f  =  0.  Therefore,  for  each  solution  X  to  SAT,  if  Xe/  =  0  and 
e  — >*  e' ,  then  Xe  =  0  holds.  □ 

C.  Incorporating  Functional  Correctness 

Note  that,  although  the  formulation  above  aims  to  add  only  the  min¬ 
imum  number  of  necessary  if  (*)s  to  the  inferred  NDSeq  specifica¬ 
tion.  the  risk  of  violating  the  functional  specification  remains.  We 
noticed  that  if  a  minimal  solution,  say  S*.  i.e.,  a  solution  of  the 
constraint  system  that  optimizes  objective  (7),  violates  functional 
correctness,  then  we  can  add  the  following  constraint  to  the  con¬ 
straint  system  to  avoid  the  solution: 

V  (8) 

ses* 

and  solve  the  resultant  constraint  system  again  while  optimizing 
the  objective  (7).  Notice  that,  Constraint  (8)  plays  the  same  role  as 
a  “conflict  clause”  in  the  SAT  terminology  and  prevents  the  solver 
to  assign  1  to  all  variables  Xs  for  s  £  S*  within  the  same  solution. 
We  can  repeat  this  process  until  we  find  a  solution  that  satisfies  both 
functional  correctness  and  passes  serializability  check  or  we  find  no 
solution  or  we  run  out  of  time.  If  we  find  a  solution  in  this  iterative 
process,  then  we  have  inferred  a  likely  NDSeq  specification  of  the 
parallel  program.  If  we  find  no  solution,  then  probably  there  is  no 
NDSeq  specification  for  the  parallel  program. 


